Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private
The exponential increase in the amount of available data makes taking
advantage of it without violating users' privacy one of the fundamental
problems of computer science for the 21st century. This question has been
investigated thoroughly under the framework of differential privacy. However,
most of the literature has not focused on settings where the amount of data is
so large that we are not able to compute the exact answer even in the
non-private setting (such as the streaming setting, the sublinear-time setting,
etc.). This can often make the use of differential privacy infeasible in
practice. In this paper, we show a general approach for making Monte-Carlo
randomized approximation algorithms differentially private. We only need to
assume that the error of the approximation algorithm is sufficiently
concentrated around 0 (e.g., that its first absolute moment is bounded) and
that the function being approximated has a small global sensitivity.
First, we show that if the error is subexponential, then the Laplace
mechanism with noise magnitude proportional to the sum of the global
sensitivity and the \emph{subexponential diameter} of the error of the
algorithm makes the algorithm differentially private. This is true even if the
worst-case global sensitivity of the algorithm is large or even infinite. We
then introduce a new additive noise mechanism, which we call the
zero-symmetric Pareto mechanism. We show that using this mechanism, we can
make an algorithm differentially private even if we only assume a bound on the
first absolute moment of the error.
Finally, we use our results to give the first differentially private
algorithms for various problems. This includes results for frequency moments,
estimating the average degree of a graph in sublinear time, and estimating the
size of the maximum matching. Our results raise many new questions, and we
state multiple open problems.
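To make the first mechanism concrete, here is a minimal sketch of the Laplace-noise approach described above: noise is added with a scale combining the global sensitivity and a concentration parameter of the approximation error, divided by the privacy parameter. The function names, the `error_diameter` parameter, and the omitted constant factors are illustrative assumptions, not the paper's exact mechanism.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from the Laplace distribution with mean 0 and the given scale."""
    # A Laplace variable is an exponentially distributed magnitude with a random sign.
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude

def private_estimate(approx_value, sensitivity, error_diameter, eps, rng=random):
    """Hypothetical privatization of a Monte-Carlo approximation: add Laplace
    noise with scale (sensitivity + error_diameter) / eps, where error_diameter
    stands in for the subexponential diameter of the algorithm's error.
    Constant factors from the actual mechanism are omitted."""
    scale = (sensitivity + error_diameter) / eps
    return approx_value + laplace_noise(scale, rng)
```

For instance, a sublinear-time approximate count with global sensitivity 1 and error diameter `d` would be released as `private_estimate(count, 1.0, d, eps)`.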
Massively Parallel Computation and Sublinear-Time Algorithms for Embedded Planar Graphs
While algorithms for planar graphs have received a lot of attention, few
papers have focused on the additional power that one gets from assuming an
embedding of the graph is available. Although in the classic sequential
setting this assumption gives no additional power (as a planar graph can be
embedded in linear time), we show that this is far from being the case in
other settings.
We assume that the embedding is straight-line, but our methods also generalize
to non-straight-line embeddings. Specifically, we focus on sublinear-time
computation and massively parallel computation (MPC).
Our main technical contribution is a sublinear-time algorithm for computing a
relaxed version of an r-division. We then show how this can be used to
estimate Lipschitz additive graph parameters. This includes, for example, the
size of a maximum matching, a maximum independent set, or a minimum dominating
set. We also show how this can be used to solve some property-testing problems
with respect to the vertex edit distance.
In the second part of our paper, we show an MPC algorithm that computes an
r-division of the input graph. We show how this can be used to solve various
classical graph problems with space per machine of O(n^(2/3 + eps)) for some
eps > 0, while performing O(1) rounds. This includes, for example, approximate
shortest paths and the minimum spanning tree. Our results also imply an
improved MPC algorithm for the Euclidean minimum spanning tree.
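To convey why a division helps estimate Lipschitz additive parameters, here is a toy sketch: compute the parameter independently on each piece and sum the results; for such parameters, ignoring the edges between pieces changes the answer by at most the number of boundary vertices. The parameter here (the size of a greedy maximal matching, a 2-approximation of maximum matching) and the piece representation are illustrative assumptions, not the paper's algorithm.

```python
def greedy_matching_size(adj, piece):
    """Size of a greedy maximal matching of the subgraph induced by `piece`.
    `adj` maps each vertex to a list of its neighbors."""
    piece_set = set(piece)
    matched = set()
    size = 0
    for u in piece:
        if u in matched:
            continue
        for v in adj[u]:
            if v in piece_set and v not in matched and v != u:
                matched.add(u)
                matched.add(v)
                size += 1
                break
    return size

def estimate_from_division(adj, pieces):
    """Sum the per-piece values. For a Lipschitz additive parameter, the
    error incurred by cutting the graph into pieces is bounded by the
    total number of boundary vertices of the division."""
    return sum(greedy_matching_size(adj, piece) for piece in pieces)
```

On a 6-vertex path cut into two pieces, the estimate is 2 while the true maximum matching has size 3; the gap reflects the boundary edge lost at the cut plus the greedy approximation.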
Estimating the Effective Support Size in Constant Query Complexity
Estimating the support size of a distribution is a well-studied problem in
statistics. Motivated by the fact that this problem is highly non-robust (as
small perturbations of a distribution can drastically affect its support
size) and thus hard to estimate, Goldreich [ECCC 2019] studied the query
complexity of estimating the ε-\emph{effective support size}
of a distribution P, which is equal to the smallest
support size of a distribution that is within total variation distance ε
of P.
In his paper, he shows an algorithm in the dual access setting (where we may
both receive random samples from P and query the sampling probability P(x)
for any element x) that computes a bicriteria approximation: the returned
value is guaranteed to lie between the effective support sizes for two nearby
values of ε, up to a multiplicative factor. However, his algorithm has either
query complexity that is super-constant in the support size or a
super-constant approximation ratio. He then asked whether this is necessary,
or whether it is possible to get a constant-factor approximation in a number
of queries independent of the support size.
We answer his question affirmatively: query complexity independent of the
support size is possible not only for a constant-factor approximation, but
even without the bicriteria relaxation. Specifically, we show an algorithm
whose query complexity depends only on ε and the desired accuracy, and which
outputs an approximation of the ε-effective support size. We also show that
the approximation ratio can be traded off against the query complexity. Our
algorithm is very simple and takes only a few lines of pseudocode.
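The quantity itself is easy to compute when the full distribution is known: the ε-effective support size is the smallest k such that the k largest probabilities carry at least 1 - ε of the mass, since zeroing out the remaining tail moves the distribution by at most ε in total variation distance. The snippet below is a definitional toy computation, not the sublinear-query algorithm from the paper.

```python
def effective_support_size(probs, eps):
    """Smallest k such that the top-k probabilities sum to at least 1 - eps.
    `probs` is the full probability vector of the distribution."""
    total = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= 1.0 - eps - 1e-12:  # small slack for float rounding
            return k
    return len(probs)
```

For example, for the distribution (0.5, 0.3, 0.1, 0.05, 0.05) and ε = 0.2, the two largest probabilities already carry mass 0.8 = 1 - ε, so the ε-effective support size is 2.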
Sampling and Counting Edges via Vertex Accesses
We consider the problems of sampling and counting edges in a graph on n
vertices and m edges, where our basic access is via uniformly sampled
vertices. When we have a vertex, we can see its degree and access its
neighbors. Eden and Rosenbaum [SOSA 2018] have shown that it is possible to
sample an edge ε-uniformly (that is, from a distribution pointwise ε-close to
uniform) in O(n/√(εm)) vertex accesses. Here, we get this down to
O((n/√m) · log(1/ε)) expected vertex accesses. Next, we
consider the problem of sampling multiple edges. For this, we introduce a
model that we call hash-based neighbor access. We show that, w.h.p., we can
sample s edges exactly uniformly at random, with or without replacement, in a
number of vertex accesses sublinear in n for s not too large. We present a
matching lower bound, which holds for ε-uniform edge multi-sampling with some
constant ε > 0 even though our positive result samples exactly uniformly
(ε = 0).
We then give an algorithm for edge counting. W.h.p., we count the number of
edges to within a 1 ± ε factor. When ε is not too small, we present a
near-matching lower bound; in the same range, the previous best upper and
lower bounds were polynomially worse in ε.
Finally, we give an algorithm that, instead of hash-based neighbor access,
uses the more standard pair queries ("are vertices u and v adjacent?").
W.h.p., it returns a 1 ± ε approximation of the number of edges, and its
expected running time matches our lower bound when ε is not too small.
Comment: This paper subsumes the arXiv report (arXiv:2009.11178), which only
contains the result on sampling one edge.
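As a baseline for what vertex accesses can do, a uniformly random edge can be sampled by rejection against the maximum degree: pick a uniform vertex, accept it with probability proportional to its degree, then output a uniform neighbor. This simple d_max-based sampler (sketched below, with assumed accessor names) needs Θ(n · d_max / m) accesses in expectation and is only a baseline; the algorithms in the paper are substantially more efficient.

```python
import random

def sample_edge(vertices, adj, d_max, rng=random):
    """Sample a uniformly random directed edge (u, v) by rejection sampling.
    In each round, a given directed edge (u, v) is output with probability
    (1/n) * (deg(u)/d_max) * (1/deg(u)) = 1/(n * d_max),
    which does not depend on the edge, so the output is exactly uniform."""
    while True:
        u = rng.choice(vertices)            # one uniform vertex access
        deg = len(adj[u])                   # degree query
        if deg and rng.random() < deg / d_max:
            v = adj[u][rng.randrange(deg)]  # uniform neighbor access
            return (u, v)
```

The acceptance step is what removes the bias toward high-degree vertices that naive "vertex plus random neighbor" sampling would introduce.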
CountSketches, Feature Hashing and the Median of Three
In this paper, we revisit the classic CountSketch method, a sparse random
projection that transforms a (high-dimensional) Euclidean vector v into a
vector of dimension t · s, where t and s are integer parameters. It is known
that even for t = 1, a CountSketch allows estimating coordinates of v with
variance bounded by ||v||_2^2 / s. For t > 1, the estimator takes the median
of the t independent row estimates, and the probability that the estimate is
off by more than a constant multiple of ||v||_2 / √s is exponentially small
in t. This suggests choosing t to be logarithmic in a desired inverse failure
probability. However, implementations of CountSketch often use a small,
constant t. Previous work only predicts a constant-factor improvement in this
setting.
Our main contribution is a new analysis of CountSketch, showing that when the
median of t > 1 independent estimates is used, the variance improves by a
factor proportional to t, asymptotically for large enough t. We also study
the variance in the setting where an inner product is to be estimated from
two CountSketches. This finding suggests that the Feature Hashing method,
which is essentially identical to CountSketch but does not make use of the
median estimator, can be made more reliable at a small cost in settings where
using a median estimator is possible.
We confirm our theoretical findings in experiments and thereby help justify
why a small constant number of estimates often suffices in practice. Our
improved variance bounds are based on new general theorems about the variance
and higher moments of the median of i.i.d. random variables, which may be of
independent interest.
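A compact sketch of CountSketch with the median estimator, using the notation above (t independent rows with s buckets each; a random bucket and a random sign per coordinate in every row). This is a textbook-style illustration, not the paper's experimental code.

```python
import random
import statistics

class CountSketch:
    def __init__(self, t, s, dim, seed=0):
        rng = random.Random(seed)
        self.t, self.s = t, s
        # For each of the t rows: a random bucket and a random sign per coordinate.
        self.bucket = [[rng.randrange(s) for _ in range(dim)] for _ in range(t)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(t)]
        self.table = [[0.0] * s for _ in range(t)]

    def add(self, v):
        """Fold the vector v into the sketch."""
        for r in range(self.t):
            for j, x in enumerate(v):
                self.table[r][self.bucket[r][j]] += self.sign[r][j] * x

    def estimate(self, j):
        """Estimate coordinate j: each row gives an unbiased estimate
        (the signed content of j's bucket); the median makes it robust."""
        return statistics.median(
            self.sign[r][j] * self.table[r][self.bucket[r][j]]
            for r in range(self.t)
        )
```

Feature Hashing corresponds to the t = 1 case, where the median disappears and only the single-row estimate remains.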